QARTOD - NetCDF Examples
This notebook provides examples of running QARTOD on a netCDF file. For background, see NcQcConfig Usage in the docs.
There are multiple ways that you can integrate ioos_qc into your netcdf-based workflow.
Option A: Store test configurations externally, pass your configuration and netcdf file to ioos_qc, and manually update netcdf variables with results of the test
* In this case, you extract variables from the netcdf file, use ioos_qc methods to run tests, and then manually update the netcdf file with results
* This provides the most control, but doesn’t take advantage of shared code in the ioos_qc library
* It’s up to you to ensure your resulting netcdf is self-describing and CF-compliant
Option B: Store test configurations externally, then pass your configuration and netcdf file to ioos_qc, and let it run tests and update the file with results
* This takes advantage of ioos_qc code to store results and configuration in the netCDF file, and ensure a self-describing, CF-compliant file
* Managing your test configurations outside the file is better when dealing with a large number of datasets/configurations
Option C: Store test configurations in your netcdf file, then pass that file to ioos_qc and let it run tests and update the file with results
* You only need to add test configurations to the file one time, and after that you could run tests over and over again on the same file
* This option is the most portable, since the data, configuration, and results are all in one place
* The downside is, test configuration management is difficult since it’s stored in the file instead of some common external location
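For Options A and B, "storing test configurations externally" can be as simple as keeping the configuration dictionary in a JSON file. The sketch below only shows the round trip with the standard library; the file name and location are hypothetical, and it makes no claim about which file formats ioos_qc itself accepts.

```python
import json
import tempfile
from pathlib import Path

# Same structure as the config used later in this notebook (trimmed to one test)
config = {
    "pco2_seawater": {
        "qartod": {
            "gross_range_test": {
                "suspect_span": [200, 2400],
                "fail_span": [0, 3000],
            }
        }
    }
}

# Hypothetical location for a shared configuration file
config_path = Path(tempfile.gettempdir()) / "pco2_qc_config.json"

# Write the configuration once...
config_path.write_text(json.dumps(config, indent=2))

# ...and load it again wherever the tests are run
loaded = json.loads(config_path.read_text())
print(loaded["pco2_seawater"]["qartod"]["gross_range_test"]["fail_span"])
```

Keeping one such file per dataset or sensor type is one way to manage many configurations without touching the netCDF files themselves.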
[1]:
# Setup directories
from pathlib import Path
basedir = Path().absolute()
libdir = basedir.parent.parent.parent
# Other imports
import pandas as pd
import numpy as np
import xarray as xr
from datetime import datetime
import netCDF4 as nc4
import tempfile
import os
import shutil
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_file, output_notebook
output_notebook()
[2]:
# # Install QC library
# !pip install git+https://github.com/ioos/ioos_qc.git
# # Alternative installation (install specific branch):
# !pip uninstall -y ioos_qc
# !pip install git+https://github.com/ioos/ioos_qc.git@BRANCHNAME
# Alternative installation (run with local updates):
!pip uninstall -y ioos_qc
import sys
sys.path.append(str(libdir))
from ioos_qc.config import NcQcConfig
from ioos_qc import qartod
Found existing installation: ioos-qc 0.2.1
Uninstalling ioos-qc-0.2.1:
Successfully uninstalled ioos-qc-0.2.1
Load the netCDF dataset
The example netCDF dataset is a pCO2 sensor from the Ocean Observatories Initiative (OOI) Coastal Endurance Inshore Surface Mooring instrument frame at 7 meters depth located on the Oregon Shelf break.
[3]:
filename = basedir.joinpath('pco2_netcdf_example.nc')
pco2 = xr.open_dataset(filename)
[4]:
for dim in pco2.dims:
    print(dim)
spectrum
time
[5]:
for var in pco2.variables:
    print(var)
obs
time
deployment
id
dcl_controller_timestamp
driver_timestamp
ingestion_timestamp
internal_timestamp
light_measurements
passed_checksum
port_timestamp
preferred_timestamp
provenance
record_time
record_type
thermistor_raw
unique_id
voltage_battery
absorbance_ratio_434
absorbance_ratio_620
pco2w_thermistor_temperature
absorbance_blank_434
absorbance_blank_620
pco2_seawater
absorbance_ratio_434_qc_executed
absorbance_ratio_434_qc_results
absorbance_ratio_620_qc_executed
absorbance_ratio_620_qc_results
pco2w_thermistor_temperature_qc_executed
pco2w_thermistor_temperature_qc_results
pco2_seawater_qc_executed
pco2_seawater_qc_results
lat
lon
[6]:
# Plot raw data
data=pco2['pco2_seawater']
t = np.array(pco2['time'])
x = np.array(data)
p1 = figure(x_axis_type="datetime", title='pco2_seawater')
p1.grid.grid_line_alpha=0.3
p1.xaxis.axis_label = 'Time'
p1.yaxis.axis_label = data.units
p1.line(t, x)
show(gridplot([[p1]], plot_width=800, plot_height=400))
QC Configuration
Here we define the generic config object for multiple QARTOD tests, plus the aggregate/rollup flag.
[7]:
# The key "pco2_seawater" indicates which variable in the netcdf file this config should run against
config = {
    'pco2_seawater': {
        'qartod': {
            'gross_range_test': {
                'suspect_span': [200, 2400],
                'fail_span': [0, 3000]
            },
            'spike_test': {
                'suspect_threshold': 500,
                'fail_threshold': 1000
            },
            'location_test': {
                'bbox': [-124.5, 44, -123.5, 45]
            },
            'flat_line_test': {
                'tolerance': 1,
                'suspect_threshold': 3600,
                'fail_threshold': 86400
            },
            'aggregate': {}
        }
    }
}
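To make the two spans in the gross range entry concrete, here is a minimal sketch of how such spans might translate into flags. The helper `gross_range_flags` is hypothetical and is not the ioos_qc implementation; it assumes values outside `fail_span` are flagged 4 (FAIL), values outside `suspect_span` but inside `fail_span` are flagged 3 (SUSPECT), and everything else is flagged 1 (GOOD).

```python
import numpy as np

def gross_range_flags(values, suspect_span, fail_span):
    """Hypothetical helper, not part of ioos_qc: flag values against two spans."""
    values = np.asarray(values, dtype=float)
    flags = np.ones(values.shape, dtype="uint8")  # start with 1 = GOOD
    # Outside the suspect span -> 3 = SUSPECT
    flags[(values < suspect_span[0]) | (values > suspect_span[1])] = 3
    # Outside the fail span -> 4 = FAIL (overrides SUSPECT)
    flags[(values < fail_span[0]) | (values > fail_span[1])] = 4
    return flags

sample = [350.0, 150.0, 2500.0, -5.0, 3200.0]  # made-up pCO2 values
flags = gross_range_flags(sample, suspect_span=[200, 2400], fail_span=[0, 3000])
print(flags.tolist())  # [1, 3, 3, 4, 4]
```

The fail span is applied last so that a value outside both spans ends up with the more severe flag.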
[8]:
# Helper method to plot QC results using Bokeh
def plot_ncresults(ncdata, var_name, results, title, test_name):
    time = np.array(ncdata.variables['time'])
    obs = np.array(ncdata.variables[var_name])
    qc_test = results[var_name]['qartod'][test_name]

    # Mask the observations so each series only shows points with one flag value
    qc_pass = np.ma.masked_where(qc_test != 1, obs)
    num_pass = (qc_test == 1).sum()
    qc_suspect = np.ma.masked_where(qc_test != 3, obs)
    num_suspect = (qc_test == 3).sum()
    qc_fail = np.ma.masked_where(qc_test != 4, obs)
    num_fail = (qc_test == 4).sum()
    qc_notrun = np.ma.masked_where(qc_test != 2, obs)

    p1 = figure(x_axis_type="datetime",
                title=f'{test_name} : {title} : p/s/f={num_pass}/{num_suspect}/{num_fail}')
    p1.grid.grid_line_alpha = 0.3
    p1.xaxis.axis_label = 'Time'
    p1.yaxis.axis_label = 'Observation Value'
    p1.line(time, obs, legend_label='obs', color='#A6CEE3')
    p1.circle(time, qc_notrun, size=2, legend_label='qc not run', color='gray', alpha=0.2)
    p1.circle(time, qc_pass, size=4, legend_label='qc pass', color='green', alpha=0.5)
    p1.circle(time, qc_suspect, size=4, legend_label='qc suspect', color='orange', alpha=0.7)
    p1.circle(time, qc_fail, size=6, legend_label='qc fail', color='red', alpha=1.0)
    #output_file("qc.html", title="qc example")
    show(gridplot([[p1]], plot_width=800, plot_height=400))
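The plot title reports pass/suspect/fail counts. When a plot is not needed, the same kind of summary can be produced directly from a flag array. The sketch below uses a hypothetical helper, `summarize_flags`, and made-up flags; the flag names follow the `flag_values` and `flag_meanings` attributes that appear on the saved QC variables later in this notebook.

```python
import numpy as np

# Flag meanings as listed in the flag_values / flag_meanings attributes
FLAG_NAMES = {1: "GOOD", 2: "UNKNOWN", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}

def summarize_flags(flags):
    """Hypothetical helper: count how many points received each flag."""
    values, counts = np.unique(np.asarray(flags), return_counts=True)
    return {FLAG_NAMES.get(int(v), str(v)): int(c) for v, c in zip(values, counts)}

example = np.array([4, 4, 1, 1, 3, 1, 2], dtype="uint8")  # made-up flags
summary = summarize_flags(example)
print(summary)  # {'GOOD': 3, 'UNKNOWN': 1, 'SUSPECT': 1, 'FAIL': 2}
```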
Option A: Manually run tests and store results
Store test configurations externally, pass your configuration and netcdf file to ioos_qc, and manually update netcdf variables with results of the test
[9]:
# Create NcQcConfig object
# Note: For tests that need tinp, zinp, etc, use args to define the t, x, y, z dimensions
# In this case, we need latitude and longitude for the location test
qc = NcQcConfig(config, lon='lon', lat='lat')
# Run tests
# Note: pass in the path to the file, *not* the netCDF dataset object
results = qc.run(filename)
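The location test needs latitude and longitude because it compares each position with the configured `bbox`. As a rough illustration, the sketch below checks a single point against a bounding box. It assumes the box is ordered `[min_lon, min_lat, max_lon, max_lat]`, and `inside_bbox` is a hypothetical helper rather than an ioos_qc function. The mooring position is the one reported in the file's global attributes.

```python
def inside_bbox(lon, lat, bbox):
    """Hypothetical helper: True if (lon, lat) lies within bbox.

    Assumes bbox is ordered [min_lon, min_lat, max_lon, max_lat].
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

bbox = [-124.5, 44, -123.5, 45]   # from the location_test config
mooring = (-124.09582, 44.6601)   # lon, lat from the file's global attributes

print(inside_bbox(*mooring, bbox))         # True: the mooring sits inside the box
print(inside_bbox(-125.0, 44.6601, bbox))  # False: a point west of the box
```

This is consistent with the location test results in this notebook, where the points shown are flagged 1.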
[10]:
# The results are an OrderedDict, with an entry for each variable and test
results
[10]:
OrderedDict([('pco2_seawater',
OrderedDict([('qartod',
OrderedDict([('gross_range_test',
masked_array(data=[4, 4, 1, ..., 1, 1, 1],
mask=False,
fill_value=999999,
dtype=uint8)),
('spike_test',
masked_array(data=[4, 4, 4, ..., 1, 1, 1],
mask=False,
fill_value=999999,
dtype=uint8)),
('location_test',
masked_array(data=[1, 1, 1, ..., 1, 1, 1],
mask=False,
fill_value=999999,
dtype=uint8)),
('flat_line_test',
array([1, 1, 1, ..., 3, 1, 1])),
('aggregate',
masked_array(data=[4., 4., 4., ..., 3., 1., 1.],
mask=False,
fill_value=1e+20))]))]))])
[11]:
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'gross_range_test')
[12]:
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'spike_test')
[13]:
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'flat_line_test')
[14]:
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'location_test')
[15]:
# To see overall results, use the aggregate test
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'aggregate')
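The aggregate flag rolls the individual tests up into one value per time step. One rule that is consistent with the results printed above is "keep the most severe flag", sketched below. The precedence order and the `rollup` helper are assumptions made for illustration; they are not taken from the ioos_qc source.

```python
import numpy as np

def rollup(*flag_arrays):
    """Hypothetical helper: keep the highest-precedence flag at each time step.

    Assumed precedence (lowest to highest): MISSING, UNKNOWN, GOOD, SUSPECT, FAIL.
    """
    precedence = [9, 2, 1, 3, 4]
    stacked = np.vstack(flag_arrays)
    result = np.full(stacked.shape[1], 9, dtype="uint8")
    for flag in precedence:
        # Higher-precedence flags are applied later and overwrite earlier ones
        result[(stacked == flag).any(axis=0)] = flag
    return result

# Made-up flags shaped like the first and last points of the results above
gross_range = np.array([4, 4, 1, 1, 1, 1])
spike = np.array([4, 4, 4, 1, 1, 1])
location = np.array([1, 1, 1, 1, 1, 1])
flat_line = np.array([1, 1, 1, 3, 1, 1])

agg = rollup(gross_range, spike, location, flat_line)
print(agg.tolist())  # [4, 4, 4, 3, 1, 1]
```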
[16]:
# Store results manually
# This is just a simple example and stores the aggregate test flag as a variable.
# You can expand upon this, or use the ioos_qc library to store the results for you (see subsequent examples)
# Create output file
outfile_a = os.path.join(tempfile.gettempdir(), 'out_a.nc')
shutil.copy(filename, outfile_a)
# Store results
with nc4.Dataset(outfile_a, 'r+') as nc_file:
    qc_agg = nc_file.createVariable('qartod_aggregate', 'u1', ('time',), fill_value=2)
    qc_agg[:] = results['pco2_seawater']['qartod']['aggregate']
[17]:
# Print results
out_a = xr.open_dataset(outfile_a)
print(out_a['qartod_aggregate'])
<xarray.DataArray 'qartod_aggregate' (time: 7339)>
array([4., 4., 4., ..., 3., 1., 1.], dtype=float32)
Coordinates:
obs (time) int64 ...
* time (time) datetime64[ns] 2015-10-08T19:35:30.569000448 ... 2015-04-11T23:35:32.253000192
lat (time) float64 ...
lon (time) float64 ...
Option B
Store test configurations externally, then pass your configuration and netcdf file to ioos_qc, and let it run tests and update the file with results
[18]:
# We already have results from the previous run, but re-create them here for completeness
qc = NcQcConfig(config, lon='lon', lat='lat')
results = qc.run(filename)
results
[18]:
OrderedDict([('pco2_seawater',
OrderedDict([('qartod',
OrderedDict([('gross_range_test',
masked_array(data=[4, 4, 1, ..., 1, 1, 1],
mask=False,
fill_value=999999,
dtype=uint8)),
('spike_test',
masked_array(data=[4, 4, 4, ..., 1, 1, 1],
mask=False,
fill_value=999999,
dtype=uint8)),
('location_test',
masked_array(data=[1, 1, 1, ..., 1, 1, 1],
mask=False,
fill_value=999999,
dtype=uint8)),
('flat_line_test',
array([1, 1, 1, ..., 3, 1, 1])),
('aggregate',
masked_array(data=[4., 4., 4., ..., 3., 1., 1.],
mask=False,
fill_value=1e+20))]))]))])
[19]:
# Create output file
outfile_b = os.path.join(tempfile.gettempdir(), 'out_b.nc')
shutil.copy(filename, outfile_b)
# Use the library to store the results to the netcdf file
qc.save_to_netcdf(outfile_b, results)
[20]:
# Explore results: qc test variables are named [variable_name]_qartod_[test_name]
out_b = xr.open_dataset(outfile_b)
print(out_b)
<xarray.Dataset>
Dimensions: (spectrum: 14, time: 7339)
Coordinates:
obs (time) int64 ...
* time (time) datetime64[ns] 2015-10-08T19:35:30.569000448 ... 2015-04-11T23:35:32.253000192
lat (time) float64 ...
lon (time) float64 ...
Dimensions without coordinates: spectrum
Data variables:
deployment (time) int32 ...
id (time) |S64 ...
dcl_controller_timestamp (time) object ...
driver_timestamp (time) datetime64[ns] ...
ingestion_timestamp (time) datetime64[ns] ...
internal_timestamp (time) datetime64[ns] ...
light_measurements (time, spectrum) float32 ...
passed_checksum (time) float32 ...
port_timestamp (time) datetime64[ns] ...
preferred_timestamp (time) object ...
provenance (time) |S64 ...
record_time (time) datetime64[ns] ...
record_type (time) float32 ...
thermistor_raw (time) float32 ...
unique_id (time) float32 ...
voltage_battery (time) float32 ...
absorbance_ratio_434 (time) float32 ...
absorbance_ratio_620 (time) float32 ...
pco2w_thermistor_temperature (time) float64 ...
absorbance_blank_434 (time) float64 ...
absorbance_blank_620 (time) float64 ...
pco2_seawater (time) float64 ...
absorbance_ratio_434_qc_executed (time) float32 ...
absorbance_ratio_434_qc_results (time) float32 ...
absorbance_ratio_620_qc_executed (time) float32 ...
absorbance_ratio_620_qc_results (time) float32 ...
pco2w_thermistor_temperature_qc_executed (time) float32 ...
pco2w_thermistor_temperature_qc_results (time) float32 ...
pco2_seawater_qc_executed (time) float32 ...
pco2_seawater_qc_results (time) float32 ...
pco2_seawater_qartod_gross_range_test (time) int8 ...
pco2_seawater_qartod_spike_test (time) int8 ...
pco2_seawater_qartod_location_test (time) int8 ...
pco2_seawater_qartod_flat_line_test (time) int8 ...
pco2_seawater_qartod_aggregate (time) int8 ...
Attributes:
node: RID16
comment:
publisher_email:
sourceUrl: http://oceanobservatories.org/
collection_method: recovered_host
stream: pco2w_abc_dcl_instrument_recovered
featureType: point
creator_email:
publisher_name: Ocean Observatories Initiative
date_modified: 2019-09-25T13:46:37.152877
keywords:
cdm_data_type: Point
references: More information can be found at http...
Metadata_Conventions: Unidata Dataset Discovery v1.0
date_created: 2019-09-25T13:46:37.152875
id: CE01ISSM-RID16-05-PCO2WB000-recovered...
requestUUID: 2d65febd-22c7-4963-8163-04bf1a8e8d32
contributor_role:
summary: Dataset Generated by Stream Engine fr...
keywords_vocabulary:
institution: Ocean Observatories Initiative
naming_authority: org.oceanobservatories
feature_Type: point
infoUrl: http://oceanobservatories.org/
license:
contributor_name:
uuid: 2d65febd-22c7-4963-8163-04bf1a8e8d32
creator_name: Ocean Observatories Initiative
title: Data produced by Stream Engine versio...
sensor: 05-PCO2WB000
standard_name_vocabulary: NetCDF Climate and Forecast (CF) Meta...
acknowledgement:
Conventions: CF-1.6
project: Ocean Observatories Initiative
source: CE01ISSM-RID16-05-PCO2WB000-recovered...
publisher_url: http://oceanobservatories.org/
creator_url: http://oceanobservatories.org/
nodc_template_version: NODC_NetCDF_TimeSeries_Orthogonal_Tem...
subsite: CE01ISSM
processing_level: L2
history: 2019-09-25T13:46:37.152852 generated ...
Manufacturer: Sunburst Sensors
ModelNumber: SAMI-pCO2
SerialNumber: C0053
Description: pCO2 Water: PCO2W Series B
FirmwareVersion: Not specified.
SoftwareVersion: Not specified.
AssetUniqueID: CGINS-PCO2WB-C0053
Notes: Not specified.
Owner: Not specified.
RemoteResources: []
ShelfLifeExpirationDate: Not specified.
Mobile: False
AssetManagementRecordLastModified: 2019-09-09T20:11:52.941000
time_coverage_start: 2015-10-08T19:35:30.569000
time_coverage_end: 2015-12-31T23:35:30.608000
time_coverage_resolution: P3683.89S
geospatial_lat_min: 44.6601
geospatial_lat_max: 44.6601
geospatial_lat_units: degrees_north
geospatial_lat_resolution: 0.1
geospatial_lon_min: -124.09582
geospatial_lon_max: -124.09582
geospatial_lon_units: degrees_east
geospatial_lon_resolution: 0.1
geospatial_vertical_units: meters
geospatial_vertical_resolution: 0.1
geospatial_vertical_positive: down
DODS.strlen: 36
DODS.dimName: string36
DODS_EXTRA.Unlimited_Dimension: obs
[21]:
# Gross range test
# Note how the config used is stored in the ioos_qc_* attributes
out_b['pco2_seawater_qartod_gross_range_test']
[21]:
<xarray.DataArray 'pco2_seawater_qartod_gross_range_test' (time: 7339)>
array([4, 4, 1, ..., 1, 1, 1], dtype=int8)
Coordinates:
obs (time) int64 ...
* time (time) datetime64[ns] 2015-10-08T19:35:30.569000448 ... 2015-04-11T23:35:32.253000192
lat (time) float64 ...
lon (time) float64 ...
Attributes:
standard_name: gross_range_test_quality_flag
long_name: Gross Range Test Quality Flag
flag_values: [1 2 3 4 9]
flag_meanings: GOOD UNKNOWN SUSPECT FAIL MISSING
valid_min: 1
valid_max: 9
ioos_qc_config: {"suspect_span": [200, 2400], "fail_span": [0, 3000]}
ioos_qc_module: qartod
ioos_qc_test: gross_range_test
ioos_qc_target: pco2_seawater
[22]:
# Aggregate/rollup flag
out_b['pco2_seawater_qartod_aggregate']
[22]:
<xarray.DataArray 'pco2_seawater_qartod_aggregate' (time: 7339)>
array([4, 4, 4, ..., 3, 1, 1], dtype=int8)
Coordinates:
obs (time) int64 ...
* time (time) datetime64[ns] 2015-10-08T19:35:30.569000448 ... 2015-04-11T23:35:32.253000192
lat (time) float64 ...
lon (time) float64 ...
Attributes:
standard_name: aggregate_quality_flag
long_name: Aggregate Quality Flag
flag_values: [1 2 3 4 9]
flag_meanings: GOOD UNKNOWN SUSPECT FAIL MISSING
valid_min: 1
valid_max: 9
ioos_qc_config: {}
ioos_qc_module: qartod
ioos_qc_test: aggregate
ioos_qc_target: pco2_seawater
Option C
Store test configurations in your netcdf file, then pass that file to ioos_qc and let it run tests and update the file with results.
In the example above, we used the library to store results and config in the netcdf file itself. At this point, we can load that same file and run tests again, without having to re-define config. This is very powerful!
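Re-running tests without re-defining the config is possible because each QC variable carries `ioos_qc_config`, `ioos_qc_module`, `ioos_qc_test`, and `ioos_qc_target` attributes, as seen in the Option B output. The sketch below shows how a configuration dictionary could, in principle, be rebuilt from those attributes. A plain dict stands in for the netCDF variables, and `rebuild_config` is a hypothetical helper, not the library's own loading code.

```python
import json

# Attributes copied from the two QC variables shown in the Option B output
# (a plain dict stands in for the netCDF variables here)
qc_variable_attrs = {
    "pco2_seawater_qartod_gross_range_test": {
        "ioos_qc_config": '{"suspect_span": [200, 2400], "fail_span": [0, 3000]}',
        "ioos_qc_module": "qartod",
        "ioos_qc_test": "gross_range_test",
        "ioos_qc_target": "pco2_seawater",
    },
    "pco2_seawater_qartod_aggregate": {
        "ioos_qc_config": "{}",
        "ioos_qc_module": "qartod",
        "ioos_qc_test": "aggregate",
        "ioos_qc_target": "pco2_seawater",
    },
}

def rebuild_config(attrs_by_variable):
    """Hypothetical helper: rebuild {target: {module: {test: params}}}."""
    config = {}
    for attrs in attrs_by_variable.values():
        target = config.setdefault(attrs["ioos_qc_target"], {})
        module = target.setdefault(attrs["ioos_qc_module"], {})
        # The test parameters are stored as a JSON string
        module[attrs["ioos_qc_test"]] = json.loads(attrs["ioos_qc_config"])
    return config

rebuilt = rebuild_config(qc_variable_attrs)
print(json.dumps(rebuilt, indent=2))
```

The rebuilt dictionary has the same nesting as the `config` object defined earlier, which is why the file alone is enough to repeat the tests.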
[23]:
# Create a copy of the output from B
outfile_c = os.path.join(tempfile.gettempdir(), 'out_c.nc')
shutil.copy(outfile_b, outfile_c)
# Load this file into the NcQcConfig object
qc = NcQcConfig(outfile_c, lon='lon', lat='lat')
# Run tests and store results
results_c = qc.run(outfile_c)
qc.save_to_netcdf(outfile_c, results_c)
[24]:
# Explore results
out_c = xr.open_dataset(outfile_c)
print(out_c)
<xarray.Dataset>
Dimensions: (spectrum: 14, time: 7339)
Coordinates:
obs (time) int64 ...
* time (time) datetime64[ns] 2015-10-08T19:35:30.569000448 ... 2015-04-11T23:35:32.253000192
lat (time) float64 ...
lon (time) float64 ...
Dimensions without coordinates: spectrum
Data variables:
deployment (time) int32 ...
id (time) |S64 ...
dcl_controller_timestamp (time) object ...
driver_timestamp (time) datetime64[ns] ...
ingestion_timestamp (time) datetime64[ns] ...
internal_timestamp (time) datetime64[ns] ...
light_measurements (time, spectrum) float32 ...
passed_checksum (time) float32 ...
port_timestamp (time) datetime64[ns] ...
preferred_timestamp (time) object ...
provenance (time) |S64 ...
record_time (time) datetime64[ns] ...
record_type (time) float32 ...
thermistor_raw (time) float32 ...
unique_id (time) float32 ...
voltage_battery (time) float32 ...
absorbance_ratio_434 (time) float32 ...
absorbance_ratio_620 (time) float32 ...
pco2w_thermistor_temperature (time) float64 ...
absorbance_blank_434 (time) float64 ...
absorbance_blank_620 (time) float64 ...
pco2_seawater (time) float64 ...
absorbance_ratio_434_qc_executed (time) float32 ...
absorbance_ratio_434_qc_results (time) float32 ...
absorbance_ratio_620_qc_executed (time) float32 ...
absorbance_ratio_620_qc_results (time) float32 ...
pco2w_thermistor_temperature_qc_executed (time) float32 ...
pco2w_thermistor_temperature_qc_results (time) float32 ...
pco2_seawater_qc_executed (time) float32 ...
pco2_seawater_qc_results (time) float32 ...
pco2_seawater_qartod_gross_range_test (time) int8 ...
pco2_seawater_qartod_spike_test (time) int8 ...
pco2_seawater_qartod_location_test (time) int8 ...
pco2_seawater_qartod_flat_line_test (time) int8 ...
pco2_seawater_qartod_aggregate (time) int8 ...
Attributes:
node: RID16
comment:
publisher_email:
sourceUrl: http://oceanobservatories.org/
collection_method: recovered_host
stream: pco2w_abc_dcl_instrument_recovered
featureType: point
creator_email:
publisher_name: Ocean Observatories Initiative
date_modified: 2019-09-25T13:46:37.152877
keywords:
cdm_data_type: Point
references: More information can be found at http...
Metadata_Conventions: Unidata Dataset Discovery v1.0
date_created: 2019-09-25T13:46:37.152875
id: CE01ISSM-RID16-05-PCO2WB000-recovered...
requestUUID: 2d65febd-22c7-4963-8163-04bf1a8e8d32
contributor_role:
summary: Dataset Generated by Stream Engine fr...
keywords_vocabulary:
institution: Ocean Observatories Initiative
naming_authority: org.oceanobservatories
feature_Type: point
infoUrl: http://oceanobservatories.org/
license:
contributor_name:
uuid: 2d65febd-22c7-4963-8163-04bf1a8e8d32
creator_name: Ocean Observatories Initiative
title: Data produced by Stream Engine versio...
sensor: 05-PCO2WB000
standard_name_vocabulary: NetCDF Climate and Forecast (CF) Meta...
acknowledgement:
Conventions: CF-1.6
project: Ocean Observatories Initiative
source: CE01ISSM-RID16-05-PCO2WB000-recovered...
publisher_url: http://oceanobservatories.org/
creator_url: http://oceanobservatories.org/
nodc_template_version: NODC_NetCDF_TimeSeries_Orthogonal_Tem...
subsite: CE01ISSM
processing_level: L2
history: 2019-09-25T13:46:37.152852 generated ...
Manufacturer: Sunburst Sensors
ModelNumber: SAMI-pCO2
SerialNumber: C0053
Description: pCO2 Water: PCO2W Series B
FirmwareVersion: Not specified.
SoftwareVersion: Not specified.
AssetUniqueID: CGINS-PCO2WB-C0053
Notes: Not specified.
Owner: Not specified.
RemoteResources: []
ShelfLifeExpirationDate: Not specified.
Mobile: False
AssetManagementRecordLastModified: 2019-09-09T20:11:52.941000
time_coverage_start: 2015-10-08T19:35:30.569000
time_coverage_end: 2015-12-31T23:35:30.608000
time_coverage_resolution: P3683.89S
geospatial_lat_min: 44.6601
geospatial_lat_max: 44.6601
geospatial_lat_units: degrees_north
geospatial_lat_resolution: 0.1
geospatial_lon_min: -124.09582
geospatial_lon_max: -124.09582
geospatial_lon_units: degrees_east
geospatial_lon_resolution: 0.1
geospatial_vertical_units: meters
geospatial_vertical_resolution: 0.1
geospatial_vertical_positive: down
DODS.strlen: 36
DODS.dimName: string36
DODS_EXTRA.Unlimited_Dimension: obs